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DECODING FACSIMILE DATA FROM THE RAPICOM 450 
Introduction 


This note describes the implementation of a program to decode 
facsimile data from the Rapicom 450 facsimile (fax) machine into an 
ordinary bitmap. This bitmap can then be displayed on other devices 
or edited and then encoded back into the Rapicom 450 format. In 
order to do this, it was necessary to understand the how the 
encoding/decoding process works within the fax machine and to 
duplicate that process ina program. This algorithm is descibed in 
an article by Weber [1] as well as in a memo by Mills [2], however, 
more information than is presented in these papers is necessary to 
successfully decode the data. 


The program was written in L10 as a subsystem of NLS running on 
TOPS20. The fax machine is interfaced to TOPS20 as a terminal 
through a microprocessor-based interface called FAXIE. 


Grateful acknowledgment is made to Steve Treadwell of University 
College, London and Jon Postel of Information Sciences Institute for 
their assistance. 


Interface to TOPS20 


The fax machine is connected to a microprocessor-based unit called 
FAXIE, designed and built by Steve Casner and Bob Parker. More 
detailed information can be found in reference [3]. FAXIE is 
connected to TOPS20 over a terminal line, and a program was written 
to read data over this line and store it in a file. The decoding 
program reads the fax data from this file. 


The data comes from the fax machine serially. FAXIE reads this data 
into an 8-bit shift register and sends the 8-bit byte (octet) over 
the terminal line. Since the fax machine assigns MARK to logical 0’s 
and SPACE to logical 1’s (which is backward from RS232), FAXIE 
complements each bit in the octet. The data is sent to TOPS20 in 
octets, the most significant bit first. If you read each octet from 
most significant bit to least significant bit in the order FAXIE 
sends the data to TOPS20, you would be reading the data in the same 
order in comes into FAXIE from the fax machine. 


The standard for storing Rapicom 450 Facsimile Data is described in 


RFC 769 [4]. According to this standard, each octet coming from 
FAXIE must be complemented and inverted (i.e. invert the order of the 
bits in the octet). Thus, the receiving program did this before 
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storing the data ina file. When the decoding program reads this 
file, it must invert and complement each octet before reading the 
data. 

Each data block from the fax machine is 585 bits long. The end of 


this data is padded with 7 0’s to make 592 bits or 74 octets. 
According to RFC 769, this data is stored ina file preceded by a 
length octet and a command octet. The possible commands are: 


56 (70 octal)--This is a Set-Up block (the first block of the 
file, contains information about the fax image) 


57 (71 octal)--This is a data block (the rest of the blocks in the 
file except for the last one) 


58 (72 octal)--End command (the last block of the file) 


The length field tells how many octets in this block and is always 76 
(114 octal) except for the END command which can be 2 (no data). The 
length and command octets are NOT inverted and complemented. 


Below is a diagram of each block in the file: 


III. The Rapicom 450 Encoding Algorithm 


An ordinary 8 1/2" by 11" document is made up of about 2100 scan 
lines, each line has 1726 pels (picture elements) in it. Each pel 
can be either black (1) or white (0). 


The Rapicom 450 has three picture quality modes. In fine detail 
mode, all of the document is encoded. In quality mode only every 
other scan line is encoded and it is intended that these missing 
lines are filled in on playback by replicating the previous line. 
There is also express mode, where only every third line is encoded. 
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Data is encoded two lines at a time, using a special two dimensional 
run-length encoding scheme. There are 1726 pels on top and 1726 pels 
on the bottom. Each pair (top-bottom) of pels is called a column. 
For each of the 1726 columns you can have any one of four 
configurations (called states): 


column 
(top-bottom) pels state 

W-W 0,0 0 

W-B 0,1 1 

B-W 1,0 2 

B-B 1,1 3 
The encoding algorithm can be described in terms of a 
non-deterministic finite-state automaton shown in Fig. 1 (after Mills 
[2]). You start out in a state (0-3) and transform to another state 
by emitting the appropriate bits marked along the arcs of the 
diagram. For example, suppose you are in state 1 (WB). To go to 


state 2 (BW), you would output the bits 101 (binary); to go to state 
0 (WW) you would output the bits 1000. Note that the number of bits 
on each transition is variable. 


In states 0 (WW) and 3 (BB), a special run length encoding scheme is 


used. There are two state variables associated with each of these 
states. One variable is a run-length counter and the other is the 
field length (in bits) of this counter. Upon entry to either of 


these two states, the counter is initialized to zero and is 
incremented for every additional column of the same state. At the 
end of the run, this counter is transmitted, extending with high 


order zeros if necessary. If the count fills up the field, it is 
transmitted, the field length is incremented by one, and the count 
starts again. This count is called the run length word and it is 


between 2 and 7 bits long. 


For example, suppose we are in state 0 (WW) and the run length for 
this state (refered to as the white run length) is 3. Suppose there 
are three 0’s in a row. The first 0 was encoded when we came to this 
state, there are two more 0’s that must be encoded. Thus we would 
send a 010 (binary). Similarly, if there are seven 0’s in a row, we 
would send a 110, but eight 0's would be sent by 111 followed by 0000 
and the white run length becomes 4. (Ten 0’s would be encoded as 111 
followed by 0010 and the white run length would be 4). 
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Non-deterministic finite-state machine diagram for RAPICOM 450 
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Run length word lengths must be between 2 and 7. The field length is 
decremented if the run is encoded in one word and: 


1. If the run length is 3 and the highest order bit is 0. 


2. Or, if the run length is 4, 5, 6, or 7 and the highest order 2 
bits are 0. 


In addition to all this, there is a special rule to follow if the run 
occupies at least two run words (and can involve incrementing the run 
word size) andthe run ends exactly at the end of a scan line. In 
this case, the last word of the run is tested for decrement as if the 
previous words in the run did not exist. 


An Example: 


To confirm the reader’s understanding of the encoding procedure, 


suppose we had the following portion of a document (1l=black, 
O=white): 

top row: D T 1.1 P0020. OZ ZK OO 0 

bottom row: 1111000000001 00 

state: 1333320000022100 


Suppose also that the black run field length is 2, the white run 
length is 3, and the state is 1. (This example comes from 
reference [1].) 
This portion would be encoded as: 

1 1011 11 000 1 0100 100 1 0 010 1000 
NOTE: It turns out that the Rapicom 450 sends the bits of a field 
in reverse order. This will be discussed in the section V. 
However, since each run length field is sent reversed, the above 
encoded bit pattern would actually be sent as: 


1 1011 11 000 1 0100 001 1 0 010 1000 


|-this is actually 100 reversed 
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IV. 


Another Example: 


This example illustrates the rule for decrementing the run length 
word lengths: 


top row: 01 T 0-0-1 41 1: 11:00 
bottom row: 111110111110 
state: LL DL NO: 


Here, let us suppose that the black run field length is now 4, the 
white is still 3, and the state is 1. 


This portion would be encoded as: 
1 1011 0001 1 1 101 0111 011 1 1000 


|-goes to 3 |-b1k cnt goes to 2 


When we reverse the order of the run fields, the bit pattern that 
is actually sent is: 


1 1011 1000 1 1 101 0111 110 1 1000 
|-this is actually 0001 reversed, etc. 
The Setup Block and the Data Header 


Each data block from the fax machine is 585 bits long. The number of 
blocks ina picture is variable and depends on the size and 
characteristics of the picture. It should be emphasized that a block 
can end in the middle of a scan line of the document. There can in 
fact be many scan lines in a block. 


The 585 bit data block is composed of a 24 bit sync code which is 
used to recognize the beginning of a block, a 37 bit header, 512 bits 
of actual data, and a 12 bit CRC checksum: 


| 24-bit | 37-bit | 512-bit | 12-bit | 
[sync code | header | data | checksum | 


The number of useful data bits is variable and can be between 0 and 
512 (although there are always 512 bits there, some of them are to be 
ignored). The number of data bits to be used is given in the header. 
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The 37 bits of header is composed of: 


| 2-bit |5-bit| 10-bit | 12-bit | 3-bit | 3-bit |2-bit| 
|seq num|flags|data count| x position|black size|white size|state| 


An explanation of these fields follows: 


IMPORTANT NOTE: Most (but not all) of these fields are sent by 
the fax machine in REVERSE ORDER. The order of each n-bit field 


must be inverted. 


Sync code 


This is used to synchronize on each block. The value of this 
24 bit field is 30474730 octal (not reversed). 


Sequence number 


This number cycles through 0, 1, 2, 3 for the data blocks. It 
is 0 for the Set-Up block (not reversed). 


Flags 


Each of these flags are 1 bit wide: 


Run 


Purpose unknown, it always seems to be 1. 
Cofb 

Purpose unknown, it always seems to be 0. 
Rpt 


1 for Set-Up blocks (which are repeated when coming from 
the fax machine though only one of them is transfered by 
FAXIE to TOPS20 and stored in the file) and 0 for data 


blocks. 
Spare 


Purpose unknown, doesn't matter. 
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Sub 
1 if this is a Set-Up block. 
Data Count 


Number of useful bits to use out of the 512 data bits. NOT ALL 
of the 512 data bits are used, only this number of them. This 
number can be 0 (usually in one of the first data blocks) which 
means to throw away this block. (This field is reversed!) 


X Position 


Current position on the scan line, a value between 0 and 1725. 
If this number is greater than where the previous block left 
off, the intervening space should be filled with white (0’s). 
If this number is less than where the previous block left off, 
set the X position to this value and replace the overlapped 
data with the new data from this block. If this number is 
greater than 1726, ignore this field and continue from where 
the previous block left off. (This field is reversed!) 


Black Size 
The size of the black run length field, must be between 2 and 
Pes This is the correct value for the black size. It may 
differ from what was found at the end of the previous block. 
(This field is reversed!) 

White Size 
The size of the white run length field, must be between 2 and 
7. It may differ from what was found at the end of the 
previous block. (This field is reversed!) 

State 
The current state. This is the correct state. It may differ 
from the state at the end of the previous block. (This field is 
not reversed.) 


Data 


512 bits of the actual encoding of the document. NOT ALL of 
this data is used in general, only the amount specified by the 
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data count. If this is a set up block, the data contains 
information about the type of document (see below). 


Checksum 


CRC checksum on the entire block. Uses polynomial 
EO EAE EA o: F3 EN, 


In a setup block, the data portion of the data block consists of: 


5-bit 1-bit 20-bits 480-bits 
spare multi page of zeros 1’s and 0's 


Specifically these are: 
6 flags (each are 1 bit): 
Start bit 
Always 0. 
Speed 
Is 1 if express mode. 
Detail 


Is 1 if detail mode. (NOTE: If the Detail and Speed flags 
are both 0, then data is in Quality mode). 


14 inch paper 
is 1 if 14 inch paper length. 
5.5 inch paper 


is 1 if 5.5 inch paper length. (NOTE: If the 14 inch and 5 
inch flags are both 0, then paper length is 11 inch). 


paper present 


is 1 if paper is present at scanner (should be always 1). 
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Spare: 

These 5 bits can be any value. 
Multi-page: 

1 if multi page mode 
Rest of data of set-up block: 


The above fields are followed by twenty 0 bits and the rest of 
the 512 bits of the block are alternating 0’s and 1’s. 


There are a number of important points to be remembered in regard to 


the header of a data block. First of all, the data count, the 
x-position, and the black and white run sizes must be read IN REVERSE 
ORDER. The reason for this is that the fax machine sends these bits 


in reverse order. However, the sequence number and the state fields 
ARE NOT REVERSED. In addition to this, each run field in the data IS 
REVERSED. This reversing of the bits in each n-bit field is 
completely separate from the reversing and complementing of each 
octet mentioned earlier. 


Second, only the first n bits, where n is the value of the data count 
field (remember its reversed!), of the data is valid, the rest is to 
be ignored. If n is zero, the whole block is to be ignored. 


Third, if the x position is beyond where the last block ended, fill 
the space between where the last block ended andthe current x 


postion with white (0’s). If the x postition is less then where the 
last block ended, replace the overlapped data with the data in the 
new block. If the x postition is greater than 1726, ignore it and 


continue from where the previous block left off. 
Fourth, the black size, white size (reversed), and state (not 
reversed!) given in the header are the correct values even if they 


disagree with the end of the previous block. 


Finally, the sequence number (not reversed) should count through 
0,1,2,3. If it does not, a block is missing. 
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The Decoding Algorithm 


Upon first glance at the finite state diagram in Figure 1, it may 
seem that it would be difficult to create a decoding procedure. For 
example, if you are in the WW state, and the next bit is a 1, how do 
you know whether to do a transition to WB or BW? The answer to this 
is to recognize that every arc out of the BW state begins with 0 and 


every arc out of WB begins with 1. Thus, if you are in the WW state, 
and the next bit is 1, followed by a 0, you know to go to the BW 
state. If the next bit is 1, followed by a 1, you know to go to the 
WB state. 


In writing the decoding program it was necessary to have two ways of 
reading the next bit in the data stream. The first way reads the bit 


and "consumes" it, i.e. increments the bit pointer to point at the 
next bit. The other way does not "consume" it. Below are four 
statements which show how to decode fax data. The numbers in 


parentheses are not to be consumed, that is to say they will be read 
again in making the next transition. 


If I am in state BW (2) and the next bits are: 


O (0): go to BW 
0111: go to BB 
010 (1): go to WB 
0100: go to WW 


If I am in state WB (1) and the next bits are: 


ORKO go to WB 
1000: go to WW 
101 (0): go to BW 
1011: go to BB 


If I am in state WW (0), then first go through the run length 
algorithm, then if the next bits are: 


Os go to BB 
1 (0): go to BW 
byes go to WB 


If I am in state BB (3), then first go through the run length 
algorithm, then if the next bits are: 


O: go to WW 
1 (0): go to BW 
BORRO go to WB 


For the run length algorithm, remember, look at the next n bits, 
where n is the length of either the black or white run length 
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word, REVERSE the bits, and output that many BB's or WW’s 
(depending on whether black or white run). If the field is full, 
increment the size of the word, and get that many bits more, i.e. 
get the next n+1 bits, etc. Also, the run length word length can 
be decremented according to the rules given in section III. 


You always go through the run length even if there is only one WW 
or BB, in this case, the run field will be 0. 


Let us look at the first example given in section III. Suppose we 
want to decode the bits: 110111100010100100100101000... (we have 
already reversed the run lengths to make things easier). 


We are in state 1 (WB) andthe black run length word length is 2 
and the white length is 3. We get these initial values either 
from the block header, or by remembering them from the previous 
transitions if this is not the start of the block. According to 
our rules, we would parse this string as follows: 


1(1) 1011 11 000 1(0) 0100 100 1(0) 0(0) 010(1) 1000... 


The numbers in parentheses are numbers that were read but not 
"consumed", thus the next number in the sequence is the same as 
the one in parentheses. First, we see a 1 and that the next bit 
is a 1, this means that we go to WB. Then we have a 1011 which 
means to go to BB. Then we do a run, we have a 11 followed by a 
000 which means the black run length gets incremented by 1 (it is 
now 3) and we get 3 MORE BB's. Now we have a1 followed by 0 
which means go to BW. Next a 0100 which is WW. Then we have a 
run, 100, which means four more WW’s. We keep going like this and 
we get the original bit pattern given in the first example of 
section III. 


It is important to always start fresh when dealing with each 
block. There are many reasons for this. The first is that 
sometimes blocks are dropped, and you can recover from this if you 
resynchronize at the start of each block. Also, if at the end of 
the previous block, there is about to be a transition, instead of 
making it at the beginning of the next block, the fax machine 
gives the new state in the header of the next block and goes from 
there. Thus it is important to always start at whatever state is 
given in the header, and to align yourself at the current X 
position given there also. 


Sometimes, while decoding a block, a bit pattern will occur which 
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does not correspond to any transition. If this happens, the rest 
of the block may be bad and should be discarded. 


The decoding program decodes the fax data block by block until it 
comes to an END command in the data, or runs out of data. 


Program Performance 


The L10 NLS program takes about two CPU minutes to run on TOPS20 on a 
DEC KL10 to decode the average document in fine detail mode. In this 
mode, the picture is about 1726 by 2100 pels, and takes about 204 
disk pages to store. 


We have a program which displays bit maps on an HP graphics terminal 
and have been able to display portions of documents. (not all of an 
8.5" by 11" document will fit in the display). We can use the 
terminal’s zoom capability to look at very small portions of the 
document. 
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In this appendix is given the first portion of the data which comes 
from the fax machine, this same data in RFC 769 format, and some of 
this data decoded into a bitmap. The data is represented in octal 
octets. 


The following is data of the form which comes out of the fax machine 
with length and command octets added: 


114 70 142 171 330 13 377 377 377 371 53 200 0 5 125 125 
125° 129-125 "125125. 125-125 1239125: 120 125 123-1239 125 125.125 
125:125. 125-125 125:125 425: 125 125125 125:125 125125 125,125 
12597125 125° 125 1259425 125° 12571257125 125° 125 125) 125 125 125 
125 125 125 125 125 125 125 125 125 121 21 261 114 71 142 171 
330 40 O 102 326 270 152 42 42 44 111 0 42 151 267 122 
366: 110: 237-102: 211 365-111 17/1336 51244. 247 377 317 111 3:62 
LTP ISTE SAR BEE BIE SLT BLE BIE SITE ZA GR, LOA ZIA 24d. AL 11d 377 
TILI 332 377 SUE SILL STE 311 SET GE GA 317 ST 347 163) 3.01, 361 
ITE SIT 349 SITT 360) LIT L2 O 114 71 142 171 330 141 137 177 
377 344 10 O 160 23 301 160 137 376 204 352 135 27 353 264 


O 70 100 7 20 75 0 0 0 0 0 344 0 0 0 0 
0 0 0 0 34 275 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 7 41 310 34 200 0 0 344 0 0 o 71 
13 331 204 O 114 “71 142 171 330 241 137 26 302 160 o 16 
100 71 0 370 270 271 0 162 O 71 174 134 100 162 0 34 
234 200 344 7 156 100 1 310 16 107 43 323 263 220 365 313 


327 57 377 325 331 36 56 47 325 324 344 3227 40 "TL. ZO 

200 T 310 1 313 220 0 0 7 241 330 0 0 137 342 200 
114 “71 142 171 330 340 77 40 142 160 0 0 0 0 162 71 
73 162 376 276 234 277 376 67 265 301 16 20 171 d 311-313 
346 377 321 75 256 113 245 377 262 160 136 247 13 251 350 374 

270 236 235 217 136 203 220 75 166 166 364 177 305 366 72 107 
63 330 352 345 313 320 71 34 270 46 57 0 


The following is the same data after put into RFC 769 format (with 
each data octet reversed and complemented) : 


114 70 271 141 344 57 0 0 0 140 53 376 377 137 125 125 
125 125 125 "1251125 125. 125-125) 125-125 125. 120 12392125: 125° 125 
125 125 125 125 125 125 125 125 125 125 125 125 125 125 125 125 
125-125: 125 125, 125 125-125 125: 125 125-120 125: 125-125: 125125 
125 125 125 125 125 125 125 125 125 165 167 162 114 71 271 141 
344 373 377 275 224 342 251 273 273 333 155 377 273 151 22 265 
220 355 6 275 156 120 155 141 204 153 332 32 0 0 155 260 

1 0 0 0 0 0 0 0 0 200 335 56 172 173 155 0 
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£55 


377 
377 
377 
H; 
375 
306 
24 
376 
114 
43 
230 
342 
63 


4 

0 
330 
343 
377 
377 
144 
143 
376 
13 
177 
PA 
261 
0 
206 
344 


0 

0 
357 
375 
377 
377 
336 
377 
330 

0 
354 
271 
200 
164 
106 
250 


0 

0 
377 
37 
377 
377 
377 
340 
37 
124 
177 
141 
202 
103 
16 
130 


0 
360 
361 
367 
307 

37 
114 
342 
211 
144 

54 
344 
306 
212 
205 

54 


ahs) 
76 
364 


The following is the first 


177 
377 
377 
377 
377 
377 
337 
377 
E 
377 
377 
377 
377 
377 
374 
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377 
377 
317 
377 
31.1 
377 
317 
377 
377 
377 
377 
377 
377 
377 
GIA 


377 
377 
377 
377 
377 
37:7 
377 
377 
377 
377 
377 
377 
377 
377 
356 


377 
377 
377 
377 
3V 
377 
377 
377 
377 
377 
377 
377 
377 
EO 
377 


377 
377 
377 
377 
377 
377 
377 
377 
E 
377 
377 
377 
377 
377 
177 


377 
377 
377 
377 
377 
377 
BAH 
377 
377 
377 
377 
377 
377 
377 


0 
257 
174 
377 
377 
354 
271 
377 
177 
213 
377 

3 
200 
132 
366 
143 


part of 
(there are about 4 scan lines here, 


377 
Sf 
EO 
367 
377 
37:7 
377 
377 
Eb 
377 
377 
377 
377 
377 

10 


377 
373 
23 
0 
103 
307 


377 
377 
377 
314 
3V 
377 
SA 
377 
377 
377 
377 
377 
377 
374 
0 
140 
10 
100 
377 
0 
372 
345 
300 
167 
67 
377 


0 
71 
200 
377 
377 
377 
172 
143 
35 
324 
172 
361 
174 
361 
221 
233 


0 
271 
336 
377 
377 
377 

J 
301 

73 
330 
344 
377 
217 
205 
320 

13 


141 
250 
330 
377 
330 
227 
305 
64 
77 
377 
377 
367 
32 
d 
377 


expanded bitmap 
2 pairs of scan 


377 
377 
ST 
377 
377 
377 
317 
377 
377 
377 
377 
377 
377 

4 
200 


377 
377 
377 
377 
377 
37:7 
377 
377 
377 
377 
377 
377 
377 
327 


377 
377 
377 
377 
3V 
377 
377 
377 
317 
377 
377 
377 
377 
377 


377 
377 
377 
377 
377 
377 
377 
377 
SF 
377 
377 
377 
377 
377 
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61 174 160 
171 5 d 
27 50 322 


of this data 


0 100 0 

0 0 0 

0 100 0 

0 20 21 
200 216 0 
47 137 336 
317 377 237 
6 347 143 
343 73 334 
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